Binaural Audio-Visual Localization

Abstract

Localizing sound sources in a visual scene has many important applications, and quite a few traditional or learning-based methods have been proposed for this task. Humans have the ability to roughly localize sound sources within or beyond the range of vision using their binaural system. However, most existing methods use monaural audio, instead of binaural audio, as a modality to help the localization. In addition, prior works usually localize sound sources in the form of object-level bounding boxes in images or videos, and evaluate the localization accuracy by examining the overlap between the ground-truth and predicted bounding boxes. This is too rough, since a real sound source is often only a part of an object. In this paper, we propose a deep learning method for pixel-level sound source localization that leverages both binaural recordings and the corresponding videos. Specifically, we design a novel Binaural Audio-Visual Network (BAVNet), which concurrently extracts and integrates features from binaural recordings and videos. We also propose a point-annotation strategy to construct pixel-level ground truth for network training and performance evaluation. Experimental results on the Fair-Play and YT-Music datasets demonstrate the effectiveness of the proposed method and show that binaural audio can greatly improve the performance of localizing sound sources, especially when the quality of visual information is limited.
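
The abstract does not specify how point annotations are turned into pixel-level ground truth or which pixel-level metric is used. The sketch below is a hypothetical illustration, not the BAVNet authors' procedure: it renders annotated source points as Gaussian blobs (`points_to_heatmap`, with an assumed width `sigma`), computes the object-level bounding-box IoU criticized above (`box_iou`), and applies an assumed pixel-level check (`peak_hit`, with an assumed `radius`) that asks whether a predicted map peaks within a few pixels of an annotated point. All names, coordinates and parameter values are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]           # (row, col) of an annotated source pixel
Box = Tuple[int, int, int, int]   # (row0, col0, row1, col1), end-exclusive


def points_to_heatmap(points: List[Point], height: int, width: int,
                      sigma: float = 2.0) -> List[List[float]]:
    """Hypothetical ground-truth map: max over Gaussians centred on the points.

    Values lie in [0, 1] and equal exactly 1.0 at each annotated pixel.
    """
    heat = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for pr, pc in points:
                d2 = (r - pr) ** 2 + (c - pc) ** 2
                heat[r][c] = max(heat[r][c], math.exp(-d2 / (2 * sigma ** 2)))
    return heat


def box_area(b: Box) -> int:
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (the object-level criterion)."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def peak_hit(pred: List[List[float]], points: List[Point],
             radius: float = 3.0) -> bool:
    """Hypothetical pixel-level check: does the predicted map peak near a source point?"""
    cells = ((r, c) for r in range(len(pred)) for c in range(len(pred[0])))
    pr, pc = max(cells, key=lambda rc: pred[rc[0]][rc[1]])
    return any(math.hypot(pr - r, pc - c) <= radius for r, c in points)


if __name__ == "__main__":
    # Hypothetical frame: a guitar occupies rows/cols 10..39, but the sound
    # source is annotated as a single point on its body.
    source = [(30, 20)]
    good = points_to_heatmap([(31, 21)], 50, 50)   # predicted map near the source
    bad = points_to_heatmap([(12, 38)], 50, 50)    # predicted map on the headstock
    guitar_box = (10, 10, 40, 40)
    print(box_iou(guitar_box, guitar_box))          # 1.0 for either prediction
    print(peak_hit(good, source), peak_hit(bad, source))  # True False
```

In this toy example, a box around the whole guitar matches the object-level ground truth perfectly for both predictions, while the point-based check separates the map that attends to the sounding part from the one that does not.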

Similar resources

An information based feedback control for audio-motor binaural localization

In static scenarios, binaural sound localization is fundamentally limited by front-back ambiguity and distance non-observability. Over the past few years, “active” schemes have been shown to overcome these shortcomings, by combining spatial binaural cues with the motor commands of the sensor. In this context, given a Gaussian prior on the relative position to a source, this paper determines an ...

Binaural Audio Project

The aim of this project is to expand on the techniques and knowledge used in binaural audio. This includes main characteristics: Interaural Time Difference (ITD), Interaural Level Difference (ILD) and Head Related Transfer Function (HRTF). Recordings were made at the University’s anechoic chamber with a dummy head and binaural microphones to test the effect of turning the head in front of a spe...
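
The ITD and ILD cues mentioned above can be estimated from a binaural recording with standard textbook formulas: the ITD as the lag that maximizes the cross-correlation between the left and right channels, and the ILD as the ratio of channel energies in decibels. The sketch below assumes a single broadband source and ignores HRTF filtering and reverberation; the function names, the 44.1 kHz sample rate and the synthetic test signal are hypothetical and not taken from the project.

```python
import math
import random
from typing import Sequence


def itd_samples(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Lag maximising sum(left[n] * right[n + lag]) over -max_lag..max_lag.

    A positive lag means the right channel is delayed, i.e. the sound
    reached the left ear first.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[n] * right[n + lag]
                    for n in range(len(left)) if 0 <= n + lag < len(right))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def ild_db(left: Sequence[float], right: Sequence[float]) -> float:
    """Level difference in dB between channel energies (positive = left louder)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


if __name__ == "__main__":
    fs = 44_100                                   # assumed sample rate in Hz
    rng = random.Random(0)
    burst = [rng.gauss(0.0, 1.0) for _ in range(500)]
    delay = 20                                    # right ear hears it 20 samples later
    left = burst + [0.0] * delay
    right = [0.0] * delay + [0.5 * x for x in burst]  # and at half the amplitude
    lag = itd_samples(left, right, max_lag=40)
    print(lag, 1000 * lag / fs, ild_db(left, right))  # ITD (samples, ms), ILD (dB)
```

With these settings the estimates are 20 samples (about 0.45 ms at the assumed sample rate) and about 6.02 dB, since halving the amplitude quarters the energy.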

Audio-Visual Clustering for Multiple Speaker Localization

We address the issue of identifying and localizing individuals in a scene that contains several people engaged in conversation. We use a human-like configuration of sensors (binaural and binocular) to gather both auditory and visual observations. We show that the identification and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. W...
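
To make the clustering formulation concrete, the sketch below uses a deliberately simple stand-in, not the method of that paper: hypothetical azimuth estimates (in degrees) from audio and from vision are pooled and grouped with plain 1-D k-means, so each cluster collects the auditory and visual evidence attributed to one speaker. The azimuth values, the initial centres and the function name `kmeans_1d` are assumptions.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], centers: List[float],
              iters: int = 20) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's k-means on scalar observations (here: azimuths in degrees)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centre for each observation.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Update step: move each centre to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels


if __name__ == "__main__":
    audio_az = [-31.0, -28.5, 19.0, 22.5]   # hypothetical sound-direction estimates
    visual_az = [-30.0, 20.0]               # hypothetical face-detection azimuths
    centers, labels = kmeans_1d(audio_az + visual_az, [-10.0, 10.0])
    print(centers, labels)                  # one cluster per speaker
```

Plain k-means requires the number of speakers to be fixed in advance, which is one reason it serves only as an illustration here.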

Audio-Visual Event Localization in Unconstrained Videos

In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos. We define an audio-visual event as an event that is both visible and audible in a video segment. We collect an Audio-Visual Event (AVE) dataset to systematically investigate three temporal localization tasks: supervised and weakly-supervised audio-visual event localization, and cross-modality l...
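
The event definition above reduces to a per-segment conjunction. A minimal sketch, assuming each video segment already carries hypothetical binary "visible" and "audible" labels (how those labels are obtained is outside this snippet), marks the audio-visual event segments and groups consecutive ones into time spans; the function names are illustrative only.

```python
from typing import List, Optional, Tuple


def av_event_segments(visible: List[bool], audible: List[bool]) -> List[bool]:
    """A segment holds an audio-visual event only if it is both visible and audible."""
    if len(visible) != len(audible):
        raise ValueError("per-segment label lists must have equal length")
    return [v and a for v, a in zip(visible, audible)]


def event_spans(mask: List[bool]) -> List[Tuple[int, int]]:
    """Group consecutive event segments into (start, end) spans, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, on in enumerate(mask):
        if on and start is None:
            start = i
        elif not on and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(mask)))
    return spans


if __name__ == "__main__":
    # Hypothetical labels for ten consecutive segments of one video.
    visible = [False, True, True, True, True, False, False, True, True, True]
    audible = [True, True, True, False, True, True, False, False, True, True]
    print(event_spans(av_event_segments(visible, audible)))
```

A sound source that is heard but off-screen, or one that is on-screen but silent, yields no event segment under this definition.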

Information-Driven Active Audio-Visual Source Localization

We present a system for sensorimotor audio-visual source localization on a mobile robot. We utilize a particle filter for the combination of audio-visual information and for the temporal integration of consecutive measurements. Although the system only measures the current direction of the source, the position of the source can be estimated because the robot is able to move and can therefore ob...
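
The point that direction-only measurements taken from several robot positions can pin down the source position can be illustrated with a minimal 2-D particle filter. This is a generic sketch under stated assumptions, not that paper's system: the source is static, the robot poses are known exactly, each measurement is an azimuth with Gaussian error of an assumed standard deviation `sigma` (radians), and the source lies somewhere in an assumed 10 m by 10 m area. Function names and parameter values are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def bearing(frm: Point, to: Point) -> float:
    """Direction (radians) from `frm` to `to`, measured from the +x axis."""
    return math.atan2(to[1] - frm[1], to[0] - frm[0])


def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b, wrapped to [-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))


def localize_by_bearings(robot_path: List[Point], bearings: List[float],
                         n: int = 2000, sigma: float = 0.05,
                         jitter: float = 0.05, area: float = 10.0,
                         seed: int = 0) -> Point:
    """Bearing-only particle filter for a static source (illustrative sketch)."""
    rng = random.Random(seed)
    # Uniform prior over the square [0, area] x [0, area].
    particles = [(rng.uniform(0, area), rng.uniform(0, area)) for _ in range(n)]
    for pose, z in zip(robot_path, bearings):
        # Gaussian likelihood on the wrapped angular error, in log space and
        # shifted by the maximum so the weights cannot all underflow to 0.
        log_w = [-0.5 * (angle_diff(z, bearing(pose, p)) / sigma) ** 2
                 for p in particles]
        m = max(log_w)
        weights = [math.exp(lw - m) for lw in log_w]
        # Multinomial resampling, then small Gaussian jitter against depletion.
        chosen = rng.choices(particles, weights=weights, k=n)
        particles = [(x + rng.gauss(0, jitter), y + rng.gauss(0, jitter))
                     for x, y in chosen]
    # Posterior-mean estimate of the source position.
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)


if __name__ == "__main__":
    source = (6.0, 7.0)  # hypothetical ground truth, unknown to the filter
    path = [(0.0, 0.0), (2.5, 0.0), (5.0, 0.0), (7.5, 0.0), (10.0, 0.0)]
    z = [bearing(p, source) for p in path]  # noise-free bearings for clarity
    print(localize_by_bearings(path, z))
```

From a single vantage point the weights only constrain the direction, so the particles stay spread along a ray; each measurement from a new position intersects that ray with another one, which is the triangulation effect that lets the robot recover distance as well as direction.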

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i4.16403